14-9 Exercise: QBSH via Optimum Assignment of Singing Pitch to Music Notes

Note that the description of this homework may be lengthy, but the concept is pretty straightforward. If you have questions about this homework, please come to our TA's office hours.

QBSH via Optimum Assignment of Singing Pitch to Music Notes

Outlines

Problem definition

Dynamic programming (DP) is a very important and useful method for sequence comparision. It has been used extensively in speech recognition, natural language processing, music analysis, and a variety of other areas. In this homework, we shall explore the use of DP for query by singing/humming (QBSH), which is an interesting paradigm of music information retrieval. (For simplicity, we shall assume our QBSH system always starts the comparison from the beginning of a song.)

In particular, you need to find the alignment between the pitch vector of human's singing and the note vector of a given melody, both in the unit of semitone or MIDI number. The correspondence between MIDI numbers and piano keyboard can be found here. Moreover,

Given a pitch vector and a note vector, your mission is to find an optimum mapping to assign each pitch point to a music note, such that the total distance is minimized. To be more specific, let's use the following notation:

The alignment path can be denoted by its coordinates, that is, $\{(1, u_1), (2, u_2), ..., (m, u_{m})\}$, where $u_k, k=1, 2, \cdots, m$ are indices into the note vector with the following constraints:

For instance, the alignment path of the above figure is ${(1, 1),(2, 1),(3, 2),(4, 2),(5, 2),(6, 2),(7, 3),(8, 3),(9, 4),(10, 4),(11, 4),(12, 4)}$, with $m=12$ (the length of $p$) and $u_{m}=3$ (the last index of music note that have at least one pitch point being assigned to). In other words:

The distance between a pitch vector $p$ and note vector $q$ can then be defined as follows: $$ dist(p, q)=\min_{u_1, u_2, ... u_{m}}\sum_{i=1}^{m} |p(i)-q(u_i)|, $$ subject to $u_1=1$ and $u_1 \leq u_2 \leq u_3 \cdots \leq u_{m}$.

Approach

Based on the above description, we can solve the task using DP. The 3-step formula for DP can be described as follows:

  1. Optimum-value function $D(i,j)$ is defined as the minimum distance between $p(1:i)$ and $q(1:j)$.
  2. Recurrent equation for $D(i,j)$ is shown next, with $i>0$ and $j>0$: $$ D(i,j)=|p(i)-q(j)| +\min \left\{ \begin{matrix} D(i-1,j)\\ D(i-1, j-1)\\ \end{matrix} \right. $$ (See the local constraint in the previous figure.) Please derive the recurrent equation when $i=1$ or $j=1$ by yourself. The boundary condition is $D(1, 1)=|p(1)-q(1)|$.
  3. Answer to the original task: $$ dist(p, q) = \min_{1\leq j \leq n}D(m, j). $$

Input/output formats

You need to write a function myPvAlign.m with the following input/output format:
[minDistance, dpPath]=myPvAlign(p, q);
Note that

Requirements & suggestions

Slides and video

Slides and video

Test sets

In order to run the following test programs, you need to download the following two p files: Here we have three test sets:
  1. Example 1: myPvAlign01.m% Twinkle twinkle little star (id=8); p=[62.9 61.8 61.8 61.8 61.8 61.8 61.4 61.4 61.4 61.4 60.9 61.4 61.8 61.4 60.9 60.9 60.9 60.9 60.9 60.9 60.9 60.9 61.4 61.8 62.9 63.9 65.7 65.7 66.3 66.3 67 67 67 67 67.7 67.7 67.7 67.7 67 68.4 67 67 67 67 67 67 67 67 67 67 67 67 67.7 67.7 68.4 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 68.4 68.4 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 69.1 68.4 67.7 67 66.3 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 66.3 66.3 65.7 65.7 65.7 65.1 64.5 64.5 64.5 65.1 65.1 65.1 65.1 65.1 65.1 65.1 65.1 65.7 65.7 65.7 65.1 65.1 65.1 65.1 65.1 65.1 65.1 65.1 64.5 64.5 63.4 64.5 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 64.5 63.4 65.1 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.9 63.4 62.3 63.9 62.3 61.8 61.4 61.4 61.4 61.8 61.8 61.8 61.8 61.8 61.8 61.4 63.4 62.3 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 61.8 60.9 61.4 60.4 60 60.4 60 60.4 60.4 60.4 60.4 60.4 60 60 60 60 60 60 61.4]; q=[60 60 67 67 69 69 67 65 65 64 64 62 62 60 67 67 65 65 64 64 62 67 67 65 65 64 64 62 60 60 67 67 69 69 67 65 65 64 64 62 62 60]; showPlot=1; figure; [minDist, dpPath]=myPvAlignSP(p, q, showPlot); return % If you want to do playback, download SAP toolbox at http://mirlab.org/jang/matlab/toolbox/sap and proceed as follows. addpath d:/users/jang/matlab/toolbox/sap % You need to change this to add SAP toolbox to the search path. % Play back of the oringal note note=[60 32 60 32 67 32 67 32 69 32 69 32 67 64 65 32 65 32 64 32 64 32 62 32 62 32 60 64 67 32 67 32 65 32 65 32 64 32 64 32 62 64 67 32 67 32 65 32 65 32 64 32 64 32 62 64 60 32 60 32 67 32 67 32 69 32 69 32 67 64 65 32 65 32 64 32 64 32 62 32 62 32 60 32]; notePlay(note); % Playback of the singing pitch vector pvPlay(p); % Playback of the induced singing pitch vector pvPlay(inducedP); minDist=86.4 dpPath=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207;1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 13 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14]

  2. Example 2: myPvAlign02.m% 10 little indian (id=5) p=[43.9 57.4 58.7 59 59.5 60 60 60.2 60.2 60 60.1 60 59.8 60.3 60.5 60.5 60.3 59.9 60.9 60.4 60.2 60 60.7 82.1 61.2 60.5 60.2 60.4 60.3 60.4 60.4 60.5 60.5 60.3 59.3 60.1 60.4 60.3 60 58.1 66.4 61.4 60.8 60.4 60.7 61.3 65.7 65.7 65.6 65.4 65.2 65 65 65.2 65.2 65.3 65.7 67.5 68 68.2 67.8 67.8 69.5 68.5 68.4 68.3 68 67.6 66.9 66.3 66.3 65.9 65.5 65 64.4 64.7 64.8 64.9 64.9 64.5 63.6 61.7 60.2 60 60.4 60.8 61.2 61.7 62 62.1 66.6 63.7 63.6 63.4 63.3 63.3 63.3 63.4 63.3 63.3 63.1 62.9 62.6 63.2 63.2 63.2 62.8 64.7 63.9 63.3 63.2 63.5 62.8 64.1 63.7 63.5 63.5 63.5 63.5 63.5 63.4 63 61.7 62 62.8 63.7 63.4 63.5 63.2 63.6 64.6 64.2 63.9 63.7 63.2 61.3 59.6 59.4 59.7 59.6 59.9 60.3 61.2 62.7 63.2 63.2 63.4 67.7 64.5 64 63.6 63.4 63.4 63.3 62.5 61.1 60.5 59.9 60.2 60 59 59.8 60.2 60 60.2 60 59.1 57.6 57 57.2 83 60.7 60 60.2 60.3 60.1 60.2 61 61 60.8 60.5 60.6 60.7 61 60.6 60.2 60.2 59.3 61.4 60.9 60.5 60.2 60.3 60.2 59.7 59.5 60.2 60.3 60 59.9 59.8 60.2 60 60.4 60.8]; q=[60 60 60 60 60 60 64 67 67 64 64 60 62 62 62 62 62 62 59 62 62 59 59 55 60 60 60 60 60 60 64 67 67 64 64 60 67 65 65 64 64 62 60]; showPlot=1; figure; [minDist, dpPath]=myPvAlignSP(p, q, showPlot); return % If you want to do playback, download SAP toolbox at http://mirlab.org/jang/matlab/toolbox/sap and proceed as follows. addpath d:/users/jang/matlab/toolbox/sap % You need to change this to add SAP toolbox to the search path. % Play back of the oringal note note=[60 26 60 13 60 13 60 26 60 13 60 13 64 26 67 13 67 13 64 13 64 13 60 26 62 26 62 13 62 13 62 26 62 13 62 13 59 26 62 13 62 13 59 13 59 13 55 26 60 26 60 13 60 13 60 26 60 13 60 13 64 26 67 13 67 13 64 13 64 13 60 26 67 26 65 13 65 13 64 13 64 13 62 26 60 26]; notePlay(note); % Playback of the singing pitch vector pvPlay(p); % Playback of the induced singing pitch vector pvPlay(inducedP); minDist=245.1 dpPath=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205;1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 14 15 16 17 18 19 19 19 19 19 19 19 20 21 22 22 23 24 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25]

  3. Example 3: myPvAlign03.m% Happy birthday to you (id=21) p=[57.1 58 58.5 59.5 60.1 60.4 60.1 60.4 60.1 59.5 59.8 59.5 59.8 60.1 59.2 62.6 62.6 62.6 62.2 61.8 61.5 61.5 61.5 62.2 62.2 62.6 62.2 61.8 61.5 61.1 58.9 58 58.3 58.5 59.5 60.1 60.1 59.8 60.1 60.4 60.4 59.8 59.8 59.8 64.1 65 65 65.4 65.4 65.4 65.4 65.4 65.4 65.4 65.4 65 65 64.1 64.1 64.6 65.4 65.4 65.9 65.4 65 65 65 65 65 65 65 65 65 65 64.6 64.6 63.7 58.9 58.9 58.5 58.9 59.5 60.1 60.1 59.8 59.5 59.8 59.2 59.5 59.2 65.4 63.3 63.3 62.9 62.9 62.9 62.9 62.6 62.6 62.6 62.6 62.6 62.2 61.8 61.1 59.8 59.5 59.5 59.8 60.1 60.1 59.8 59.5 59.5 59.2 59.2 59.2 66.3 67.3 67.8 67.8 67.3 67.3 67.3 67.3 67.3 67.3 67.3 67.3 67.8 66.8 66.3 65.9 65.4 65.9 66.3 66.3 65.9 65.4 65.9 65.9 65.9 65.9 65.9 65.9 65.9 65.4 63.7 59.8 59.8 60.1 60.8 60.8 60.4 59.8 60.8 59.8 60.1 60.4 60.1 70 71.2 71.8 72.4 72.4 72.4 72.4 72.4 73.1 73.1 73.1 73.1 72.4 72.4 71.2 68.8 67.8 68.3 68.8 69.4 69.4 69.4 69.4 69.4 69.4]; q=[60 60 62 60 65 64 60 60 62 60 67 65 60 60 72 69 65 64 62 70 70 69 65 67 65]; showPlot=1; figure; [minDist, dpPath]=myPvAlignSP(p, q, showPlot); return % If you want to do playback, download SAP toolbox at http://mirlab.org/jang/matlab/toolbox/sap and proceed as follows. addpath d:/users/jang/matlab/toolbox/sap % You need to change this to add SAP toolbox to the search path. % Play back of the oringal note note=[60 29 60 10 62 38 60 38 65 38 64 77 60 29 60 10 62 38 60 38 67 38 65 77 60 29 60 10 72 38 69 38 65 38 64 38 62 77 77 70 29 70 10 69 38 65 38 67 38 65 38]; notePlay(note); % Playback of the singing pitch vector pvPlay(p); % Playback of the induced singing pitch vector pvPlay(inducedP); minDist=103.4 dpPath=[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185;1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 7 7 7 7 7 7 7 7 7 7 7 7 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13 13 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 16 16]


Audio Signal Processing and Recognition (音訊處理與辨識)